OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Indexing and Searching Mathematics in Digital Libraries

Internal identifier: 000388 (Main/Exploration); previous: 000387; next: 000389

Authors: Petr Sojka [Czech Republic]; Martin Líška [Czech Republic]

Source: Lecture Notes in Computer Science, 2011 (ISSN 0302-9743)

RBID : ISTEX:0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8

Abstract

This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system are presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware search engine based on the state-of-the-art system, Apache Lucene. Scalability was checked on 324,000 real scientific documents from the arXiv archive containing 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.
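
The idea of indexing subformulae of Presentation MathML, mentioned in the abstract, can be illustrated with a small sketch. This is not the MIaS implementation: the function name `subformulae` and the sample formula are hypothetical, and the sketch only enumerates every element subtree of one formula using Python's standard library. It assumes the input has no namespace prefixes and no whitespace between tags.

```python
import xml.etree.ElementTree as ET


def subformulae(mathml: str) -> list[str]:
    """Return every element subtree of a MathML string, serialized,
    in document order (the whole formula comes first).

    Illustrative only: a real indexer would also normalize and weight
    these subtrees before storing them.
    """
    root = ET.fromstring(mathml)
    # root.iter() walks the root and all of its descendants depth-first.
    return [ET.tostring(node, encoding="unicode") for node in root.iter()]


# Hypothetical sample formula: a + b^2
example = (
    "<math><mrow><mi>a</mi><mo>+</mo>"
    "<msup><mi>b</mi><mn>2</mn></msup></mrow></math>"
)
subs = subformulae(example)
for s in subs:
    print(s)
```

Each printed string is one candidate indexing unit, which gives an intuition for why a corpus with 112 million formulae can yield billions of indexed subformulae.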

URL: https://api.istex.fr/document/0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8/fulltext/pdf
DOI: 10.1007/978-3-642-22673-1_16


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Indexing and Searching Mathematics in Digital Libraries</title>
<author>
<name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
</author>
<author>
<name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-22673-1_16</idno>
<idno type="url">https://api.istex.fr/document/0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000687</idno>
<idno type="wicri:Area/Istex/Curation">000679</idno>
<idno type="wicri:Area/Istex/Checkpoint">000044</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Sojka P:indexing:and:searching</idno>
<idno type="wicri:Area/Main/Merge">000393</idno>
<idno type="wicri:Area/Main/Curation">000388</idno>
<idno type="wicri:Area/Main/Exploration">000388</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Indexing and Searching Mathematics in Digital Libraries</title>
<author>
<name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
<affiliation wicri:level="3">
<country xml:lang="fr">République tchèque</country>
<wicri:regionArea>Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno</wicri:regionArea>
<placeName>
<settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
<author>
<name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
<affiliation wicri:level="3">
<country xml:lang="fr">République tchèque</country>
<wicri:regionArea>Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno</wicri:regionArea>
<placeName>
<settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8</idno>
<idno type="DOI">10.1007/978-3-642-22673-1_16</idno>
<idno type="ChapterID">16</idno>
<idno type="ChapterID">Chap16</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system are presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware search engine based on the state-of-the-art system, Apache Lucene. Scalability was checked on 324,000 real scientific documents from the arXiv archive containing 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République tchèque</li>
</country>
<region>
<li>Moravie</li>
</region>
<settlement>
<li>Brno</li>
</settlement>
</list>
<tree>
<country name="République tchèque">
<region name="Moravie">
<name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
</region>
<name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
<name sortKey="Liska, Martin" sort="Liska, Martin" uniqKey="Liska M" first="Martin" last="Líška">Martin Líška</name>
<name sortKey="Sojka, Petr" sort="Sojka, Petr" uniqKey="Sojka P" first="Petr" last="Sojka">Petr Sojka</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000388 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000388 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:0FF53C9AF09F36F0742CA95BA5ECF84248EB2FA8
   |texte=   Indexing and Searching Mathematics in Digital Libraries
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024